Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add closure-move-bindings RFC #3512

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

SOF3
Copy link

@SOF3 SOF3 commented Oct 10, 2023

As per discussion in #2407, adds the move(y=f(x)) || {}/move(x) || {} syntax.

Rendered

@SOF3 SOF3 force-pushed the closure-move-bindings branch 2 times, most recently from 8edcf1e to 78c402e Compare October 10, 2023 15:07
@ehuss ehuss added the T-lang Relevant to the language team, which will review and decide on the RFC. label Oct 10, 2023
text/3512-closure-move-bindings.md Outdated Show resolved Hide resolved
text/3512-closure-move-bindings.md Outdated Show resolved Hide resolved
SOF3 and others added 2 commits October 11, 2023 08:55
Co-authored-by: Jacob Lifshay <programmerjake@gmail.com>
Comment on lines 129 to 133
> &nbsp;&nbsp; _NamedMoveBinding_ | _UnnamedMoveBinding_\
> _NamedMoveBinding_ :\
> &nbsp;&nbsp; _PatternNoTopAlt_ `=` _Expression_\
> _UnnamedMoveBinding_ :\
> &nbsp;&nbsp; `mut`<sup>?</sup> ( _IdentifierExpression_ | _MethodCallExpression_ )\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires infinite lookahead for parsing if I am not mistaken, a method call expression can begin with any expression which we cannot differentiate from a pattern without attempting to parse the whole construct first. This is I believe the reason as to why assignment destructuring (pat = expr) doesn't parse full patterns either, it accepts a subset of expressions on the lhs and interprets those are patterns.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As specified below, the MethodCallExpression always starts with an identifier, so this is not a parsing issue right now. But it is indeed problematic if we want to relieve this restriction in the future.

Copy link
Author

@SOF3 SOF3 Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mental model would be to parse it like this:

if peek("mut") { parse("mut") }
if peek(Identifier) && !peek("=") {
    UnnamedMoveBInding { parse(Expr)}
} else {
     NamedMoveBinding { parse(Pattern), parse("="), parse(Expr) }
}

SOF3 and others added 2 commits October 11, 2023 15:13
Co-authored-by: Lukas Wirth <lukastw97@gmail.com>
Co-authored-by: Lukas Wirth <lukastw97@gmail.com>
[guide-level-explanation]: #guide-level-explanation

A closure may capture bindings in its defining scope.
Bindings are captured by reference by default:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not accurate.
Bindings are "captured by example", so if the closure needs a reference to the value it's captured by reference and if needs an owned value it's captured by move...except for Copy types which always capture by reference unless you move.

Issue with more details:
rust-lang/rust#100905

I think we should consider changing this behavior in a new edition and seeing if this helps without needing new syntax.
Personally for me every time I need the workarounds described in this RFC it was because of this.

Copy link
Author

@SOF3 SOF3 Oct 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it sound reasonable and feasible to detect this as a breaking change and fix/warn in automated edition migration tools?


- It is a very frequent scenario to clone a value into a closure
(especially common with `Rc`/`Arc`-based values),
but even the simplest scenario requires three lines of boilerplate:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing some implied boilerplate here, but is this just two lines?

Suggested change
but even the simplest scenario requires three lines of boilerplate:
but even the simplest scenario requires two lines of boilerplate:

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be just one line (and maybe even inlined in another expression). So that's 3.5 lines of boilerplate.

Co-authored-by: teor <teor@riseup.net>
dbg!(foo); // the outer `foo` is still [1] because only the cloned copy was mutated
```

The above may be simplified to `move(mut foo.clone())` as well.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be left as a future extension?

  1. It is just syntactic sugar, having the same effect as move(mut foo = foo.clone()).
  2. It's a bit weird that only a subset of expressions are allowed -- namely, those where the first token is an identifier.
  3. It should possibly be extended so that move(mut self.foo.clone()) yields move(mut foo = self.foo.clone()).
  4. There's no precedent for introducing such bindings. In particular, arguably, it should work similarly in constructors: Self { foo.clone() } should be short-hand for Self { foo: foo.clone() } for consistency.

I think it would be easier to first focus on the idea of explicit capture -- and whether it's worth it -- and defer the special short-hand notation for later, and thus that it would make sense to move this to a possible future extension.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this should be left for future, if added at all.

By looking at move(mut foo.clone()), it is unclear what it should bind to. In this situation, foo is the only ident but in move(mut foo.bar.clone()), is it supposed to be bound to bar or foo? That sort of ambiguity isn't a good thing to add.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's why it is strictly limited to a single identify in the method call receiver.

Comment on lines 148 to 151
the closure expression is of the _ImplicitReference_ type, where
all local variables in the closure construction scope not shadowed by any _MoveBinding_
are implicitly captured into the closure by shared or mutable reference on demand,
preferring shared reference if possible.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to ask for a special sigil for this "and other implicit captures" behavior:

move(x, y, ..) || {}

Indeed, by requiring a trailing .. in order to allow implicit captures:

  • such implicit capture is now clearer: the x, y enumeration is now visibly non-exhaustive;
  • it gives the option not to provide , .., thereby requiring exhaustive enumeration of captures: this gives the power to the person defining the closure to restrict the kind of captures a closure may use.
    • Given that the motivation for this syntax is finer-grained control, it would be a pity not to be featuring such a capability, and have to wait for a later edition-requiring amendment to get into this behavior.
    • reasons for restricting external captures can be:
      • soundness of more advanced designs (e.g., macros and advanced lifetime shenanigans);
      • especially in FFI or other type-erased scenarios where certain lifetimes may have been lost;
      • (or when worried about things like {A,}Rc cycles in case of implicit by-move captures)
      • better control of a closure size;
      • helps teachability of how closures work: move(x) |…| { … } would always be an x-capturing closure, no matter the body of ..., thereby properly decoupling implementation details from API surface (of the closure type).
  • it mirror struct pattern destructuring, wherein a Foo { x, .. } pattern is not expected to be exhaustive over the fields it has, whereas Foo { x } is.

Copy link

@hamza1311 hamza1311 Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, by requiring a trailing .. in order to allow implicit captures:

Do you mean implicit captures by-value? or keeping the current default capture behavior (without move)? Foo { x, .. } means all fields except x are ignored so it would make sense if all fields other than x were ignored in move(x, ..) when doing move captures

Copy link

@danielhenrymantilla danielhenrymantilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big 💯 from me; the one thing I have been having to grant to C++ programmers is the superiority of their explicit lambda-captures notation: it's been long due that Rust have the same. Thanks @SOF3 for taking the time to write this, all while detailing the parsing rules 🙏

  • I have two main tweaks to suggest to the RFC, however. For there to need to use .. to make the captures non-exhaustive (e.g., move(..) || {} or move(x, y, ..) || {}), and for these implicit captures to be captured by, as the keyword itself says, move.

Comment on lines +208 to +209
1. It is more consistent to have `move(x)` imply `move(x=x)`,
which leaves us with implicit references for the unspecified.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that writing move(&some_var) is not that much longer than move(some_var), so this argument may be a bit weak (to clarify, in a vacuum / ceteris paribus, I'm on board with this argument; but in this case, it may not pull its weight compared to the advantages of by-move implicit captures).

Comment on lines +210 to +215
2. Move bindings actually define a new shadowing binding
that is completely independent of the original binding,
so it is more correct to have the new binding explicitly named.
Consider how unintuitive it is to require that
a moved variable be declared `mut` in the outer scope
even though it is only mutated inside the closure (as the new binding).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this RFC lands, I could imagine there being a lint against implicitly by-move-captured mut bindings, be it through move ||, or through move(..) || 🙂 (warn-by-default for the latter, and the former requiring the edition dance)

are implicitly moved or copied into the closure on demand.

Note that `move` with an empty pair of parentheses is allowed and follows the former rule;
in other words, `move() |...| {...}` and `|...| {...}` are semantically equivalent.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ughh, I have to admit to finding this rather counter-intuitive:
I'd have expected move() || to be closer to move || than to ||

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Capture-by-reference is the default behavior for implicit captures for two reasons:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to suggest the opposite, here, actually: for move(..) (or move() if the suggestion for .. is rejected) to mean the same as move.

  • big consistency gain: move(..) || being closer to move || than to ||;

  • make by-reference captures be the odd one that stands out. I agree this may be subjective, but bear with me:

    There are two kind of closures:

    • immediately-called closures, such as with Option::map() or panic::catch_unwind();
    • long-lived closures (: 'long), such as the spawn() family.

    The former, 99% of the time, accomodates to fully implicit || {} capture rules, and the latter, kind of accomodate to move || closures, except for when specific "right-before move adjustments" are needed, which is very often just a .clone() (and sometimes an Arc::downgrade()).

    This very RFC, by providing explicit captures, is thus mostly helping the latter case: the one with : 'long-lived requirements. In such a scenario, capturing by reference is rather non-desired, or something niche stemming from a having long-lived borrowable available (e.g., borrowing a variable but because it would happen to outlive the 'env of a scope::spawn())). It thus does seem rather legitimate to favor by-move captures for the ..-implicit ones, much like move || does.

  • lastly, although I agree this one may not be that important to others, I feel like it would be the most helpful syntax w.r.t. teachability: with the current semantics of implicit by-ref captures for move(..), it would mean having borrows by using the move keyword, which kind of denaturates/dilutes the meaning of that word. Whereas in the case of move(..) implicitly capturing by move, the explicit move(&x) would be justified insofar the word move would have been itself neutered/overriden by the explicit & sigil.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, rather than adding move(..) (which could just be written move) or move() (which could just be omitted), we should have a warn-by-default lint for an empty capture list (with a suggestion that removes it and a note that mentions using move if you want to capture everything), and add it to the list of lints suppressed in macros (since macros might want to generate a capture list).

> _NamedMoveBinding_ :\
> &nbsp;&nbsp; _PatternNoTopAlt_ `=` _Expression_\
> _UnnamedMoveBinding_ :\
> &nbsp;&nbsp; `mut`<sup>?</sup> ( _IdentifierExpression_ | _MethodCallExpression_ )\
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be quite valuable to also have move(&x, &mut y) syntax, for the specific cases of explicit by-reference captures:

  • not only for my suggested implicit-by-move change, should it be accepted;
  • but even in the case of the current implict-by-ref-captures semantics of the RFC, when wanting to be explicit, it would be nicer to be using move(&x, &mut y) than move(x = &x, y = &mut y), especially when x and y are longer_variable_names.
  • there would otherwise be no way to mimic, explicitly, the "magic binding" semantics which implicit by-ref closures captures currently have (e.g., when having x: i32, an implicit by-ref capture of x makes it so the closure has captured &x, but the x name is still fully usable to refer to the *&x place).

If we didn't have this sugar, I could very well see myself having a .by_ref() and .by_mut() couple of blanket-implemented extension methods.

  • whilst these methods could be a bearable alternative, I don't think it's worth the hassle w.r.t. adding another 1-lookahead case for these parsing rules:
    $(mut)? $binding:ident $(. $method…)?
OR
    &$(mut)? $binding:ident
OR
    $binding:pat = $e:expr

Copy link

@hamza1311 hamza1311 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally! After years of discussion, thanks @SOF3 for finally pushing forward and creating the RFC

dbg!(foo); // the outer `foo` is still [1] because only the cloned copy was mutated
```

The above may be simplified to `move(mut foo.clone())` as well.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this should be left for future, if added at all.

By looking at move(mut foo.clone()), it is unclear what it should bind to. In this situation, foo is the only ident but in move(mut foo.bar.clone()), is it supposed to be bound to bar or foo? That sort of ambiguity isn't a good thing to add.

@camsteffen
Copy link

Syntax bikeshed. My brain really wants to parse move(x) as a function call. What if it looked like this?

move { x, y: y.clone() } || foo(x, y);

This syntax is analogous to a struct literal. In that way I think it might be more intuitive, while also less likely to be confused as something it's not (function call) because the lowercase move is an upfront hint that it is not a struct.

@danielhenrymantilla
Copy link

danielhenrymantilla commented Oct 14, 2023

[my .02c] @camsteffen I agree the move() may look a bit too much like a call, making the braced syntax appealing; but the issue that could be pointed out, then, is that it doesn't map that well to the shorthand syntax of x.clone() (or my suggested &var):

move { x, &y, w.clone(), z: foo } |…| {}
  • (it also has the issue of introducing yet another whitespace in this syntax, contrary to () and [] which "stick" to move)

Maybe an in-between, which, for better or for worse, would resemble C++, would be to be using square brackets? 🤷

move[x, &y, w.clone(), z = foo] |…| {}

Be it as it may, I'd be fine with any choice among these three grouping delimiters, on condition that it not preclude the shorthand capture syntax.

@CryZe
Copy link

CryZe commented Oct 14, 2023

async move { x, y, z } { x }

would probably be ambiguous, otherwise I really like that syntax. Though arguably the move should be in front of the async anyway, not sure why it was stabilized this way. Maybe this can be changed on an edition transition to support the move {} syntax.

@nhuurre
Copy link

nhuurre commented Oct 15, 2023

If I may add my two bikeshedding cents, I think the syntax is cleanest when name bindings look like let-statements. That is, instead of a move-clause before the closure I propose a move let as the first statement in the closure body.

|| {
    move let foo = foo.clone();
    foo.run()
}

It is of course potentially confusing that the move let statement appears inside the closure body but is executed immediately on closure construction, not invocation. I consider this analogous to static items that may appear inside the body but are executed at compile time:

|| {
    static X : T = f1(); // f1 called once at compile time
    let foo : T = f2();  // f2 called every time the closure is called
    foo.bar(&X)
}

Hopefully the requirement that move let must be the first statement in the closure makes it sufficiently distinct from the rest of the body even without delimiters.

Also, I don't think omitting the let causes any ambiguity, i.e. the syntax could be just move $PATTERN = $EXPR; instead of move let $PATTERN = $EXPR;. You could write move x; as shorthand for move x = x;.

The existing syntax of move before the closure

move || {
    x + y
}

could be interpreted as shorthand for listing every implicitly captured name:

|| {
    move (x,y);
    x + y
}

@SOF3
Copy link
Author

SOF3 commented Oct 15, 2023

@nhuurre I think the case for static is a bit different, in that static initializers are always const expressions so it does not matter when they are executed, but move initializers may have side effects and the execution time matters. After all, static expressions are executed during compile time instead of during "static constructors" (or whatever relevant concept) as you put it akin to.

@SOF3
Copy link
Author

SOF3 commented Oct 15, 2023

@camsteffen maybe it's just me, but move { foo } || foo() looks like an expression joining a struct literal and a function call with a logical or as well.

That said, it has never occurred to me (maybe it does to those unfamiliar with Rust) that the following looks like a logical or expression:

move || foo()

As with @danielhenrymantilla's comment, it is equally likely to read move[foo] || foo() as a logical or expression with an array index.

Maybe it's because almost every syntax highlighter (no matter semantic or lexical) always highlights the move keyword? In that case, it would be obvious that move() is not a function call because it involves a keyword, as much as break (a, b) wouldn't cause ambiguity that we are calling a function called break with arguments a and b. From this perspective, perhaps it's better to leave a space between move and the ( in the style guide?

```rs
let foo = 1;
let mut bar = 2;
let mut closure = move(mut foo) || {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that GitHub renders move() with the color of a function call instead of as a keyword. This adds to the previously discussed problem of ambiguity as a function call.

@nikomatsakis
Copy link
Contributor

I have previously written a draft RFC around adding capture clauses -- I still like that design. I'd love to see it compared/contrasted with this one (haven't had time to fully read the RFC yet), as well as to know whether this one achieves the various bits of motivation.

One obvious difference is that I did not support move(x = y+1), though that would be an easy extension. I did support move(x.clone()) which would effectively be shorthand for move(x = x.clone()).

@SOF3
Copy link
Author

SOF3 commented Nov 19, 2023

Based on the comments above, I am planning to separate the RFC into two features:

  • closure_move_bindings:
    • Introduce the syntax move(foo = bar) || expr and move(mut foo = bar) || expr, where foo is an ident and bar and expr are expressions. This is just an alternative syntax for { let mut foo = bar; || expr} with the appropriate move semantics.
    • Introduce the syntax move(foo) || expr and move(mut foo) || expr, where foo is an ident. This is an alternative syntax for move(foo = foo) || expr.
  • closure_move_binding_method_call: Introduce the move(foo.bar()) || expr syntax, where foo and bar are idents. This is an alternative syntax for move(foo = foo.bar()) || expr.

@SOF3
Copy link
Author

SOF3 commented Nov 23, 2023

Another issue: Is it better to use : instead of = to separate the pattern and the expression? That would be more consistent with the field: expr syntax in struct literals, which would make more sense if field: field can be shortened into field. Or is it better to reserve : for type annotations in the future?

@danielhenrymantilla
Copy link

@SOF3, regarding name: value vs. name = value:

  • it does have a nice "symmetry" with braced struct syntax, assuming braces being used as delimiters;
  • this, in turn, "curses" the syntax not to have too much feature creep, lest there be inconsistencies between braced struct syntax and this closure capture one:
    • move(&x, &mut y) would have to become move { ref x, ref mut y } (and there seems to be kind of an implicit but general consensus that ref [mut] is confusing for newcomers and something which new syntax should try to avoid).
    • what about move(var.method())? No such braced syntax.

So even if the different shorthands end up split as follow-up features / RFCs, if keeping them in mind, we may actually want to avoid var: value syntax for the captures of this RFC

@SOF3 SOF3 mentioned this pull request Dec 7, 2023
Co-authored-by: Josh Triplett <josh@joshtriplett.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet